Oct. 3rd, 2018

About Me

SAS Code

See on Github
libname datum '~/../myshortcuts/SAS/CPS_Project/perf';

data demog;
    set datum.performancetask_demog;
run;
data finstat;
    set datum.performancetask_finstat;
run;

data studentDataRaw;
    merge demog finstat;
    by student_unique_id;
    if grade_level = '09' or grade_level = '10' or grade_level = '11' or grade_level = '12';
    if (leave_date < input('2015-06-01',yymmdd10.) or entry_date > input('2016-06-30',yymmdd10.)) then delete;
    
    /* create empty values */
    numerator=.;
    denominator=.;
    
    /* set inital denominator values */
    if leave_date >= input('2015-06-01',yymmdd10.) then denominator = 1;
    
    /* if still enrolled then include in denominator and exclude from numerator*/
    if leave_date >= input('2016-06-30',yymmdd10.) then denominator = 1;
    if leave_date >= input('2016-06-30',yymmdd10.) then numerator = 0;
    
    /* define denominator values for leave_codes */
    if leave_code = '31' then denominator = 1;   
    else if leave_code = '32' or leave_code = '33' or leave_code = '34' then do;
        if (
          verified_transfer = 'yes' 
          or verified_transfer = 'unknown'
          or leave_date >= input('2016-02-01',yymmdd10.)
        ) and denominator = . then denominator = 0;
        else denominator = 1;
    end;
    else if leave_code = '35' and denominator = . then denominator = 0;
    else if leave_code = '40' and denominator = . then denominator = 0;
    else if leave_code = '41' and denominator = . then denominator = 0;
    else if leave_code = '52' and denominator = . then denominator = 1;
    else if leave_code = '53' and denominator = . then denominator = 1;
    else if leave_code = '55' and denominator = . then denominator = 1;
    else if leave_code = '67' and denominator = . then denominator = 1;
    else if leave_code = '86' and denominator = . then denominator = 1;
    else if leave_code = '87' and denominator = . then denominator = 1;
    else if leave_code = '88' and denominator = . then denominator = 1;
    else if leave_code = '99' and denominator = . then denominator = 1;
    
    /* define numerator values for leave_codes */
    if leave_code = '31' then numerator = 1;   
    else if leave_code = '32' or leave_code = '33' or leave_code = '34' then do;
        if (
        verified_transfer = 'yes' 
        or verified_transfer = 'unknown'
        or leave_date >= input('2016-02-01',yymmdd10.)) and denominator ^= 1 then numerator = 0;
        else numerator = 1;
    end;
    else if leave_code = '35' and numerator =. then numerator = 0;
    else if leave_code = '40' and numerator = . then numerator = 0;
    else if leave_code = '41' and numerator = . then numerator = 0;
    else if leave_code = '52' and numerator = . then numerator = 1;
    else if leave_code = '53' and numerator = . then numerator = 1;
    else if leave_code = '55' and numerator = . then numerator = 0;
    else if leave_code = '67' and numerator = . then numerator = 1;
    else if leave_code = '86' and numerator = . then numerator = 1;
    else if leave_code = '87' and numerator = . then numerator = 1;
    else if leave_code = '88' and numerator = . then numerator = 1;
    else if leave_code = '99' and numerator = . then numerator = 1;
run;

data studentDataErrors;
  set studentDataRaw;
  if denominator = . or numerator = .;
run;


data studentData;
  retain school_unique_id student_unique_id grade_level student_gender leave_code verified_transfer numerator denominator;
  set studentDataRaw (keep=school_unique_id student_unique_id grade_level student_gender leave_code verified_transfer numerator denominator);
  if numerator ^=. and denominator ^= .;
run;

/* create district level dataset */
proc sql;
  create table districtData as
  select
    (
      sum(numerator)/sum(denominator)
    ) as overall_dropout_rate,
    (
      sum((student_gender eq 'FEMALE')*numerator)/sum((student_gender eq 'FEMALE')*denominator)
    ) as female_dropout_rate,
    (
      sum((student_gender eq 'MALE')*numerator)/sum((student_gender eq 'MALE')*denominator)
    ) as male_dropout_rate
  from studentData;
  quit;

/* create school level dataset */
proc sql;
  create table schoolData as
  select 
  school_unique_id,
  (
  sum(numerator)/sum(denominator)
  ) as schoolwide_dropout_rate,
  (
  sum((student_gender eq 'FEMALE')*numerator)/sum((student_gender eq 'FEMALE')*denominator)
  ) as female_dropout_rate,
  (
  sum((student_gender eq 'MALE')*numerator)/sum((student_gender eq 'MALE')*denominator)
  ) as male_dropout_rate
  from studentData
  group by school_unique_id;
  quit;

proc export data=studentDataRaw
   outfile='~/../myshortcuts/SAS/CPS_Project/studentDataRaw.csv'
   dbms=csv
   replace;
run;

proc export data=studentDataErrors
   outfile='~/../myshortcuts/SAS/CPS_Project/studentDataErrors.csv'
   dbms=csv
   replace;
run;

proc export data=districtData
   outfile='~/../myshortcuts/SAS/CPS_Project/districtData.csv'
   dbms=csv
   replace;
run;

proc export data=schoolData
   outfile='~/../myshortcuts/SAS/CPS_Project/schoolData.csv'
   dbms=csv
   replace;
run;   

proc delete data=work.demog work.finstat work.studentDataRaw;

Data files produced by the SAS code:

  • studentData.csv - File containing only data columns requested in the packet.
  • studentDataRaw.csv - File containing all student data and the computed numerator and denominator values.
  • studentErrors.csv - File containing all students who were not assigned a numerator or denominator by our SAS code.
  • districtData.csv - File containing our drop-out rate computations for the entire district.
  • schoolData.csv - File containing our drop-out rate computations for each school.

Data Quality

Tools used for Data QA

To check the quality of the data, I will be using R. To check the data for errors, we will be using the studentDataRaw data set generated by our SAS code. We will refer to it simply as studentData.

studentData <- read.csv("studentDataRaw.csv")
schoolData <- read.csv('schoolData.csv')
districtData <- read.csv('districtData.csv')
errors <- read.csv('studentDataErrors.csv')

For data quality check, we

  1. Chose a handful of students with different leave_code and verify numerator/denominator indicators by hand.
  2. Analyze the studentDataErrors dataset generated by our SAS script which contains 29 errors.
  3. Double check our schoolData calculations to see if they make sense and if there are any errors.

In this presentation we carry out bullets 2 and 3. We also discuss ways we may consider in improving our metric.

studentDataErrors

There are 29 students in the error file generated by our SAS code. Below is a summary of the data.

##  STUDENT_UNIQUE_ID SCHOOL_UNIQUE_ID  grade_level    STUDENT_GENDER
##  244021956: 1      1440252:10       Min.   : 9.00   FEMALE:14     
##  248634831: 1      1440079: 5       1st Qu.:10.00   MALE  :15     
##  (Other)  :27      (Other):14       Median :12.00                 
##                                     Mean   :11.21                 
##                                     3rd Qu.:12.00                 
##                                     Max.   :12.00                 
##    leave_date         leave_code verified_transfer numerator     
##  Min.   :2015-08-14   L:29       N/A:29            Mode:logical  
##  1st Qu.:2015-09-22                                NA's:29       
##  Median :2016-03-10                                              
##  Mean   :2016-01-31                                              
##  3rd Qu.:2016-05-13                                              
##  Max.   :2016-06-28                                              
##   denominator
##  Min.   :1   
##  1st Qu.:1   
##  Median :1   
##  Mean   :1   
##  3rd Qu.:1   
##  Max.   :1

Observations

By analyzing the summary table, we observe the following:

  • Each of these students has leave_code = L and verified_transfer = 'N/A', as well as denominator = 1. Our SAS code is not programmed to deal with this leave code.
  • The range of leave_date is from Aug 14 2015 to Jun 28 2016. This means that these students left during the school year.

Investigation

We observed that leave_code = L and verified_transfer = 'N/A'. Based on the contingency table below, we see that there are other rows of data which meet this criteria though were assigned a numerator:

verified_transfer
leave_code
31 32 33 34 35 40 41 52 55 67 86 87 99 L N/
N/A 71212 1204 0 0 0 783 265 77 5 3125 266 3502 1956 353 23991 9764
NO 0 0 908 723 25 0 0 0 0 0 0 0 0 0 0 0
UNKNOWN 0 0 22 223 0 0 0 0 0 0 0 0 0 0 0 0
YES 0 0 720 4456 13 0 0 0 0 0 0 0 0 0 0 0

The range of leave_date for the students who did not receive a numerator is August 14 2015 to June 28 2016, which means these students left during the school year. Is this significant? Below is a frequency chart of leave_date for all students with leave_code=L and verified_transfer = N/A.

The rest of the students with the same leave code and verified_transfer have a leave date between July 01 2016 and July 01 2018

Possible Solutions

Possible solutions are:

  • Contact schools for further information about the student.
  • Edit metric definition/code system to reflect the situation these students fall under.

schoolData

Below shows a histogram of the School-wide drop-out rate computed by our SAS code.

65 schools have a drop-out rate equal to 1. Why?

Further investigation is needed to determine why these schools were rated so high. We leave it as an exercise for the reader.

school_unique_id schoolwide_dropout_rate female_dropout_rate male_dropout_rate number.of.students
1440097 1 NA 1 1
1440176 1 1 1 4
1440216 1 1 1 5
1440266 1 NA 1 1
1440511 1 1 1 13
2195294 1 1 1 4

QUESTION FROM SCHOOL

Question

"John Smith (Student Unique ID 256595825) left our school to transfer to another Chicago Public School. We can see on the student system that, John enrolled at Potter High School (School Unique ID 2198052) on 10/14/2016 and remains enrolled. Why was he counted as a dropout in the 2015-2016 dropout rate?"

Answer

Below is John Smith's data from our SAS script.

STUDENT_UNIQUE_ID SCHOOL_UNIQUE_ID leave_code leave_date verified_transfer numerator denominator
4449 256595825 2195093 31 2016-05-24 N/A 1 1

John Smith has a leave code of 31, which tells us he transferred to another school. John is listed as having not enrolled (transfer status is N/A) in a new school by June 30th, 2016, which is why he is included in the metric.

Changes to the metric

Something to consider when defining metrics is how the metric varies over the population sample. Interact with the graph below by clicking/double clicking on the legend.

Fin